Cross-lingual alignments of ELMo contextual embeddings

Authors

Abstract

Building machine learning prediction models for a specific NLP task requires sufficient training data, which can be difficult to obtain for less-resourced languages. Cross-lingual embeddings map word embeddings from a less-resourced language to a resource-rich language, so that a model trained on data from the resource-rich language can also be used in the less-resourced language. To produce cross-lingual mappings of recent contextual embeddings, the anchor points between the embedding spaces have to be words in the same contexts. We address this issue with a novel method for creating cross-lingual contextual alignment datasets. Based on that, we propose several cross-lingual mapping methods for ELMo embeddings. The proposed linear mapping methods use existing Vecmap and MUSE alignments on contextual ELMo embeddings. Novel nonlinear ELMoGAN mapping methods are based on GANs and do not assume isomorphic embedding spaces. We evaluate the proposed methods on nine languages, using four downstream tasks: named entity recognition (NER), dependency parsing (DP), terminology alignment, and sentiment analysis. The ELMoGAN methods perform very well on the NER tasks, with a lower loss compared to direct training on some languages. In DP and sentiment analysis, the linear variants are more successful.
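The linear mappings referenced above (Vecmap, MUSE in supervised mode) reduce to solving an orthogonal Procrustes problem over paired anchor embeddings. The sketch below illustrates only that core idea on synthetic data; it is not the paper's pipeline, which applies such mappings to contextual ELMo embeddings and also proposes nonlinear GAN-based alternatives.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: find orthogonal W minimizing ||X @ W - Y||_F,
    given row-aligned anchor embeddings X (source) and Y (target)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy data: 5 paired "anchor embeddings" in 4 dimensions, where the
# target space is an exact rotation of the source space.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
W_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal map
Y = X @ W_true

W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y))  # True: the recovered map reproduces the anchors
```

When the two spaces are only approximately isomorphic, the same closed-form solution gives the best orthogonal fit in the least-squares sense; the paper's ELMoGAN variants are motivated precisely by cases where this isomorphism assumption breaks down.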


Similar resources

A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments

While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Mode...


Cross-lingual Wikification Using Multilingual Embeddings

Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wiki...


Trans-gram, Fast Cross-lingual Word-embeddings

We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages using English as a pivot language. We show that some linguistic features are aligned across lang...


Cross-lingual Models of Word Embeddings: An Empirical Comparison

Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans...


A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings

Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level alignment of sentences from parallel corpora; in this paper we define a general bilingual training objective function requiring only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word e...



Journal

Journal title: Neural Computing and Applications

Year: 2022

ISSN: 0941-0643, 1433-3058

DOI: https://doi.org/10.1007/s00521-022-07164-x